Previous fine-grained datasets mainly focus on classification and are often captured in a controlled setup, with the camera focusing on the objects. We introduce the first Fine-Grained Vehicle Detection (FGVD) dataset in the wild, captured from a moving camera mounted on a car. It contains 5502 scene images with 210 unique fine-grained labels of multiple vehicle types organized in a three-level hierarchy. While previous classification datasets also include makes for different kinds of cars, the FGVD dataset introduces new class labels for categorizing two-wheelers, autorickshaws, and trucks. The FGVD dataset is challenging because it contains vehicles in complex traffic scenarios with intra-class and inter-class variations in type, scale, pose, occlusion, and lighting conditions. Current object detectors such as YOLOv5 and Faster R-CNN perform poorly on our dataset due to a lack of hierarchical modeling. Along with providing baseline results for existing object detectors on the FGVD dataset, we also present the results of combining an existing detector with the recent Hierarchical Residual Network (HRN) classifier for the FGVD task. Finally, we show that FGVD vehicle images are the most challenging to classify among the fine-grained datasets.
Using geometric landmarks like lines and planes can increase navigation accuracy and decrease map storage requirements compared to commonly used LiDAR point cloud maps. However, landmark-based registration for applications like loop closure detection is challenging because a reliable initial guess is not available. Global landmark matching has been investigated in the literature, but these methods typically use ad hoc representations of 3D line and plane landmarks that are not invariant to large viewpoint changes, resulting in incorrect matches and high registration error. To address this issue, we adopt the affine Grassmannian manifold to represent 3D lines and planes and prove that the distance between two landmarks is invariant to rotation and translation if a shift operation is performed before applying the Grassmannian metric. This invariance property enables the use of our graph-based data association framework for identifying landmark matches that can subsequently be used for registration in the least-squares sense. Evaluated on a challenging landmark matching and registration task using publicly available LiDAR datasets, our approach yields a 1.7x and 3.5x improvement in successful registrations compared to methods that use viewpoint-dependent centroid and "closest point" representations, respectively.
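To make the Grassmannian metric mentioned above more concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes one standard construction: an affine subspace of R^n (a 3D line or plane) is embedded as a linear subspace of R^(n+1), and the distance between two embedded subspaces is the norm of their principal angles. The function names `orthonormal_basis`, `embed_affine`, and `grassmann_distance` are hypothetical, and the shift operation described in the abstract is not implemented here.

```python
import numpy as np


def orthonormal_basis(directions):
    """Return an (n, k) matrix with orthonormal columns spanning `directions`."""
    d = np.asarray(directions, dtype=float)
    d = d.reshape(len(d), -1)          # a single direction vector becomes (n, 1)
    q, _ = np.linalg.qr(d)             # reduced QR gives an orthonormal basis
    return q


def embed_affine(directions, point):
    """Embed the affine subspace {point + span(directions)} of R^n as an
    (n+1, k+1) orthonormal basis of a linear subspace of R^(n+1).
    Assumed construction for illustration; not taken from the paper."""
    A = orthonormal_basis(directions)
    p = np.asarray(point, dtype=float)
    b0 = p - A @ (A.T @ p)             # displacement orthogonal to the subspace
    scale = np.sqrt(1.0 + b0 @ b0)
    top = np.hstack([A, (b0 / scale)[:, None]])
    bottom = np.hstack([np.zeros(A.shape[1]), 1.0 / scale])[None, :]
    return np.vstack([top, bottom])


def grassmann_distance(Y1, Y2):
    """Geodesic distance between two subspaces of equal dimension,
    computed from the principal angles between their orthonormal bases."""
    s = np.linalg.svd(Y1.T @ Y2, compute_uv=False)
    theta = np.arccos(np.clip(s, -1.0, 1.0))
    return float(np.linalg.norm(theta))


if __name__ == "__main__":
    # Two 3D lines, each given by a direction and a point on the line.
    line_a = (np.array([1.0, 0.0, 0.0]), np.array([0.0, 0.0, 0.0]))
    line_b = (np.array([0.0, 1.0, 0.0]), np.array([0.0, 0.0, 2.0]))
    print(grassmann_distance(embed_affine(*line_a), embed_affine(*line_b)))
```

In this sketch the distance is unchanged when both landmarks are rotated about the origin, because the rotation acts as an orthogonal map on the embedded bases. Translating both landmarks generally does change it, which is consistent with the abstract's statement that a shift is needed before applying the metric.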
With the increasing use of Graph Neural Networks (GNNs) in critical real-world applications, several post hoc explanation methods have been proposed to understand their predictions. However, there has been no work on generating explanations on the fly during model training and using them to improve the expressive power of the underlying GNN models. In this work, we introduce a novel explanation-directed neural message passing framework for GNNs, EXPASS (EXplainable message PASSing), which aggregates only embeddings from nodes and edges identified as important by a GNN explanation method. EXPASS can be used with any existing GNN architecture and subgraph-optimizing explainer to learn accurate graph embeddings. We theoretically show that EXPASS alleviates the oversmoothing problem in GNNs by slowing the layer-wise loss of Dirichlet energy, and that the embedding difference between vanilla message passing and the EXPASS framework can be upper bounded by the difference of their respective model weights. Our empirical results show that graph embeddings learned using EXPASS improve the predictive performance and alleviate the oversmoothing problems of GNNs, opening up new frontiers in graph machine learning to develop explanation-based training frameworks.
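The idea of aggregating only over important edges, and its effect on Dirichlet energy, can be illustrated with a toy Python sketch. This is not the EXPASS algorithm: the edge importance weights are set by hand rather than produced by an explainer, the aggregation is a plain weighted mean with a self-loop, and the Dirichlet energy uses one common unnormalized definition (sum of squared embedding differences over edges). The names `dirichlet_energy` and `message_passing_step` are hypothetical.

```python
import numpy as np


def dirichlet_energy(X, edges):
    """Sum of squared embedding differences over undirected edges
    (one common, unnormalized definition; assumed here)."""
    X = np.asarray(X, dtype=float)
    return float(sum(np.sum((X[i] - X[j]) ** 2) for i, j in edges))


def message_passing_step(X, edges, edge_weight=None):
    """One weighted-mean aggregation step with a unit-weight self-loop.
    `edge_weight` stands in for explainer importance scores (hand-set here)."""
    X = np.asarray(X, dtype=float)
    if edge_weight is None:
        edge_weight = np.ones(len(edges))   # vanilla: every edge counts equally
    out = X.copy()                          # self-loop contribution
    norm = np.ones(X.shape[0])
    for (i, j), w in zip(edges, edge_weight):
        out[i] += w * X[j]                  # always read from the input X
        out[j] += w * X[i]
        norm[i] += w
        norm[j] += w
    return out / norm[:, None]


if __name__ == "__main__":
    # Path graph 0-1-2-3 with two groups of nodes: {0, 1} and {2, 3}.
    X = np.array([[0.0], [0.0], [1.0], [1.0]])
    edges = [(0, 1), (1, 2), (2, 3)]
    vanilla = message_passing_step(X, edges)
    masked = message_passing_step(X, edges, edge_weight=[1.0, 0.0, 1.0])
    print(dirichlet_energy(vanilla, edges), dirichlet_energy(masked, edges))
```

On this toy graph, one vanilla step reduces the energy from 1 to 1/3, while masking the edge between the two groups keeps it at 1. A single hand-picked example only illustrates the intuition; it does not substitute for the theoretical result stated in the abstract.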
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
A classical result in learning theory shows the equivalence of PAC learnability of binary hypothesis classes and the finiteness of VC dimension. Extending this to the multiclass setting was an open problem, which was settled in a recent breakthrough result characterizing multiclass PAC learnability via the DS dimension introduced earlier by Daniely and Shalev-Shwartz. In this work we consider list PAC learning where the goal is to output a list of $k$ predictions. List learning algorithms have been developed in several settings before and indeed, list learning played an important role in the recent characterization of multiclass learnability. In this work we ask: when is it possible to $k$-list learn a hypothesis class? We completely characterize $k$-list learnability in terms of a generalization of DS dimension that we call the $k$-DS dimension. Generalizing the recent characterization of multiclass learnability, we show that a hypothesis class is $k$-list learnable if and only if the $k$-DS dimension is finite.
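For concreteness, one common way to formalize the $k$-list setting in the realizable case is sketched below. The notation ($\mathcal{X}$, $\mathcal{Y}$, $\mathcal{H}$, $\mu$, $\mathcal{D}$, $\mathcal{A}$, $m$) is an assumption introduced here for illustration and may differ from the paper's own. A $k$-list predictor maps each input to a set of at most $k$ labels and errs only when the true label falls outside that set:

$$
\mu : \mathcal{X} \to \{\, S \subseteq \mathcal{Y} : |S| \le k \,\}, \qquad
L_{\mathcal{D}}(\mu) = \Pr_{(x,y) \sim \mathcal{D}} \bigl[\, y \notin \mu(x) \,\bigr].
$$

Under this formalization, a class $\mathcal{H} \subseteq \mathcal{Y}^{\mathcal{X}}$ is $k$-list PAC learnable if there exist an algorithm $\mathcal{A}$ and a sample-complexity function $m(\epsilon, \delta)$ such that for all $\epsilon, \delta \in (0, 1)$, every distribution $\mathcal{D}$ realizable by $\mathcal{H}$, and every $n \ge m(\epsilon, \delta)$:

$$
\Pr_{S \sim \mathcal{D}^{n}} \bigl[\, L_{\mathcal{D}}(\mathcal{A}(S)) \le \epsilon \,\bigr] \ge 1 - \delta .
$$

Setting $k = 1$ recovers the usual multiclass PAC definition.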
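Polyculture farming has environmental advantages but requires substantially more pruning than monoculture farming. We present novel hardware and algorithms for automated pruning. Using an overhead camera to collect data from a physical-scale garden testbed, the autonomous system uses a learned plant phenotyping convolutional neural network and a bounding disk tracking algorithm to evaluate the individual plant distribution and estimate the state of the garden each day. From this garden state, AlphaGardenSim selects plants to prune autonomously. A trained neural network detects and targets specific prune points on the plant. Two custom-designed pruning tools, compatible with a farming-robot gantry system, are experimentally evaluated and execute autonomous cuts through controlled algorithms. We present results for four 60-day garden cycles. Results suggest that the system can autonomously achieve 0.94 normalized plant diversity with pruning shears while maintaining an average canopy coverage of 0.84 by the end of the cycles. For code, videos, and datasets, see https://sites.google.com/berkeley.edu/pruningpolyculture.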
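As post hoc explanations are increasingly used to understand the behavior of graph neural networks (GNNs), it becomes crucial to evaluate the quality and reliability of GNN explanations. However, assessing the quality of GNN explanations is challenging because existing graph datasets have no or unreliable ground-truth explanations for a given task. Here, we introduce a synthetic graph data generator, ShapeGGen, which can generate a variety of benchmark datasets (e.g., varying graph sizes, degree distributions, homophilic vs. heterophilic graphs) accompanied by ground-truth explanations. Further, the flexibility to generate diverse synthetic datasets and corresponding ground-truth explanations allows us to mimic the data generated by various real-world applications. We include ShapeGGen and several real-world graph datasets in an open-source graph explainability library, GraphXAI. In addition to synthetic and real-world graph datasets with ground-truth explanations, GraphXAI provides data loaders, data processing functions, visualizers, GNN model implementations, and evaluation metrics to benchmark the performance of GNN explainability methods.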
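Conversational recommender systems (CRS) are attracting growing attention as conversation-based, recommendation-oriented tools that provide items of interest and explore user preferences. However, existing work on CRS fails to explicitly show the reasoning logic to users, and the whole CRS remains a black box. We therefore propose a novel end-to-end framework named Explanation Generation for Conversational Recommendation (EGCR), which generates explanations for conversational agents to explain why they take an action. EGCR incorporates user reviews to enhance the item representation and increase the informativeness of the whole conversation. To the best of our knowledge, this is the first framework for explainable conversational recommendation on real-world datasets. Moreover, we evaluate EGCR on one benchmark conversational recommendation dataset and achieve better performance in both recommendation accuracy and conversation quality than other state-of-the-art models. Finally, extensive experiments demonstrate that the generated explanations are not only of high quality and explainability but also make the CRS more trustworthy. We will make our code available to contribute to the CRS community.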
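As recommender systems become increasingly sophisticated and complex, they often suffer from a lack of fairness and transparency. Providing robust and unbiased explanations for recommendations has been drawing more and more attention, as it can help address these issues and improve the trustworthiness and informativeness of recommender systems. However, although such explanations are generated for humans, who respond more strongly to messages with appropriate emotions, emotions are rarely considered when generating explanations for recommendations. Current explanation generation models are found to exaggerate certain emotions without accurately capturing the underlying tone or meaning. In this paper, we propose a novel method based on a multi-head transformer, called Emotion-aware Transformer for Explainable Recommendation (EmoTER), to generate more robust, fair, and emotion-enhanced explanations. To measure the linguistic quality and emotion fairness of the generated explanations, we adopt both automatic text metrics and human perceptions for evaluation. Experiments on three widely used benchmark datasets with multiple evaluation metrics demonstrate that EmoTER consistently outperforms existing state-of-the-art explanation generation models in terms of text quality, explainability, and fairness with respect to the emotion distribution. The implementation of EmoTER will be released as an open-source toolkit to support further research.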
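Real-estate image tagging is an essential use case for reducing manual annotation effort and enhancing the user experience. This paper proposes an end-to-end pipeline (referred to as RE-Tagger) for the real-estate image classification problem. We present a two-stage transfer learning approach using a custom InceptionV3 architecture to classify images into different categories (such as bedroom, bathroom, kitchen, balcony, and hall). Finally, we release the application as a REST API hosted as a web application running on a machine with 2 GB of RAM. A demo video is available here.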